Schema Prompting


Lumenore introduces Schema-Level Prompting, enabling users to create conversational prompts that define how the AI should process data. For instance, users can specify instructions such as, “When I mention ‘yearly sales,’ use the order date year column from my schema,” or “When I say ‘gain,’ always reference the profit column.”

These prompts can include detailed instructions about goals, constraints, and data interpretations, ensuring that the AI processes data according to the user’s specific needs. Once defined, schema-level prompts are seamlessly integrated into the system prompt of AI modules, enabling the AI to operate with the customized context provided by the user.
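As a rough illustration of what integrating schema-level prompts into a system prompt could look like, the sketch below appends user-defined rules to a base prompt. The function name build_system_prompt, the base prompt text, and the "Schema-level instructions" heading are all hypothetical; this is not Lumenore's actual implementation, which is not described here.

```python
def build_system_prompt(base_prompt: str, schema_prompts: list[str]) -> str:
    """Append schema-level rules to a base system prompt (illustrative only)."""
    # Drop empty entries so blank prompts do not produce empty rules.
    rules = [p.strip() for p in schema_prompts if p.strip()]
    if not rules:
        return base_prompt
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"{base_prompt}\n\nSchema-level instructions:\n{numbered}"


# Hypothetical base prompt; the two rules are the examples quoted above.
system_prompt = build_system_prompt(
    "You answer questions about the user's data.",
    [
        "When I mention 'yearly sales', use the order date year column from my schema.",
        "When I say 'gain', always reference the profit column.",
    ],
)
print(system_prompt)
```

Numbering the rules is only one possible design choice; it makes each instruction easy to tell apart when several prompts are defined for the same schema.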

By empowering users to set schema-level instructions, this feature enhances the relevance and accuracy of AI-generated insights. It increases the flexibility of AI modules to adapt to specific business requirements, improving user satisfaction and ensuring more contextually accurate responses based on predefined schema rules.

It accounts for 20% of the total score.

Guideline

You can provide detailed descriptions that clarify the context and behavior of the schema. For example, you can tell the AI that the schema contains “retail sales and profitability data across different regions.”
You can also use variable definitions and specify rules for columns.

Example Schema-Level Prompts

When defining schema-level prompts, users should ensure that the instructions are detailed and provide adequate context for the AI to act upon. Below are some example prompts:

Retail Sales:

Schema Preference:
“When I mention ‘month’, always use the ‘order_date_month’ column instead of ‘ship_date_month’.”
New Variable Creation:
“When I say ‘EBITDA’, always calculate it as ‘Revenue – COGS – Operating Expenses’, using the ‘net_revenue’, ‘cogs’, and ‘operating_expenses’ columns, respectively.”
Date Preference:
“When I say ‘sales performance’, always use ‘order_date’ for the time-based analysis rather than ‘ship_date’, since it better reflects when the sale actually occurred.”
Schema Description:
“This schema contains data about sales, profit, and performance by channel and category, including product name, region, and sales agent.”
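To make the "New Variable Creation" prompt concrete, here is a minimal sketch of the calculation it describes, applied to a single row. The column names come from the example prompt; the function name and the sample values are invented for illustration.

```python
def ebitda(row: dict) -> float:
    """Formula from the example prompt: Revenue - COGS - Operating Expenses."""
    return row["net_revenue"] - row["cogs"] - row["operating_expenses"]


# Invented sample values, for illustration only.
sample_row = {"net_revenue": 1000.0, "cogs": 400.0, "operating_expenses": 250.0}
print(ebitda(sample_row))  # 350.0
```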

Things You Can Do:

Use conversational language with clear, detailed instructions.
Provide a full description of how the prompt should behave.
Specify the exact goal you want the AI to help users achieve.
Include relevant context, such as specific tables or data sources, for precise actions (e.g., “yearly sales from the yearly table”).

Things to Avoid:

Avoid single-word responses as they lack context.
Don’t leave the instructions vague or ambiguous, as this can confuse the AI.
Avoid brevity that may limit the AI’s ability to perform accurately.

Do’s and Don’ts Example:

Do’s

Use conversational language with clear, detailed instructions.
Provide full descriptions of how the prompt should behave.
You can provide detailed information about your schema, like “This schema contains retail sales and profitability data across channels, geographies, and categories.”
Include relevant context, such as specific tables or data sources for precise actions (e.g., ‘yearly sales from the yearly table’).
Use Variable definitions: e.g., ‘Calculate EBITDA like …’
Use Synonyms Suggestions: e.g., ‘When I say revenue, refer to the sales column.’
Specify Minifier Suggestions: e.g., ‘ASOT should always reference the Top Sales field.’
Mention Column Suggestions: e.g., ‘When I say month, pick the order date month column.’

Don’ts

Avoid being overly brief; it can limit accuracy.
Don’t leave instructions vague; it can confuse the AI.
Don’t apply row-level security in prompts.
Avoid defining exact filters; they may not work at the schema level.
Keep the scope at the schema level; don’t specify question-level details. An example of what to avoid: “When asked about sales, break down the results by location and salesperson.”
Don’t use keywords from the data dictionary’s synonyms or minifiers. Their rules always take precedence over schema prompt rules.
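Because data dictionary synonyms and minifiers take precedence, it can help to check a draft schema prompt for overlapping keywords before saving it. The sketch below is one hypothetical way to do that outside the product; the function name, the term list, and the net_income column are assumptions made for illustration.

```python
import re


def find_dictionary_conflicts(prompt: str, dictionary_terms: list[str]) -> list[str]:
    """Return the dictionary terms that appear as whole words in the prompt."""
    lowered = prompt.lower()
    conflicts = []
    for term in dictionary_terms:
        # Word boundaries prevent 'gain' from matching 'revenue gains'.
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            conflicts.append(term)
    return conflicts


# Hypothetical synonyms and minifiers already defined in the data dictionary.
dictionary_terms = ["earnings", "revenue gains", "ASOT"]
draft_prompt = "When I say 'earnings', always reference the net_income column."
print(find_dictionary_conflicts(draft_prompt, dictionary_terms))  # ['earnings']
```

A non-empty result suggests the rule belongs in the data dictionary rather than in the schema prompt.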

Additional Notes:

Schema prompts should be designed at the schema level, not specific to questions or filters.
Avoid using keywords from the data dictionary, as they may conflict with the prompt’s rules.

Best Practices for Writing Column Descriptions in Data Dictionary

When writing column descriptions in your data dictionary, make sure they guide AI modules such as Ask Me, AI dashboard, and Insights board to interpret each column correctly. A description should define not only what the data means but also when and how it can be used, its potential synonyms, and how to aggregate it.

Note: Column descriptions are suggestive guidelines for the AI models. The synonyms, units of measurement, and definitions specified in the data dictionary will always take precedence over these descriptions.

Below are best practices for writing column descriptions:

1. Mention Synonyms and Alternative Terms

In the column description, you can include synonyms or alternative terms that users may use when asking questions. This ensures that the AI can recognize different phrasing and still retrieve the correct column.

Example:

Column Name: PROFIT
- Description: Use this PROFIT column when the user asks about ‘profits’, ‘earnings’, ‘revenue gains’, or ‘margins’.
- Synonyms: ‘earnings’, ‘revenue gains’, ‘margins’.

Use Case:

User asks: “What are the total earnings for this quarter?”
- Response: The AI should use the PROFIT column and show a chart for Profit by quarter.
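The intended effect of listing synonyms can be pictured as a simple lookup from the user's wording to a column. The sketch below uses plain substring matching purely for illustration; the AI modules interpret language far more flexibly, and the function and dictionary names here are hypothetical.

```python
from typing import Optional

# Synonyms taken from the PROFIT example above.
COLUMN_SYNONYMS = {
    "PROFIT": ["profits", "earnings", "revenue gains", "margins"],
}


def resolve_column(question: str, synonyms: dict[str, list[str]]) -> Optional[str]:
    """Return the first column whose name or synonym appears in the question."""
    text = question.lower()
    for column, terms in synonyms.items():
        if column.lower() in text or any(term in text for term in terms):
            return column
    return None


print(resolve_column("What are the total earnings for this quarter?", COLUMN_SYNONYMS))  # PROFIT
```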

2. Specify Column’s Role in Aggregations

Make it clear when the column should be used for aggregation (sum, average, count, etc.) and what type of aggregation it is suitable for. This helps the AI in responding to queries where users ask for summaries or grouped data.

Example:

Column Name: SALES
- Description: This column represents the total sales amount for each transaction. Always use Sum(sales) when the user asks a question about sales, unless the user specifies another aggregation in the question.
- Aggregation: Sum

Use Case:

User asks: “What was the sales amount for the last week?”
- Response: The AI should aggregate the SALES column as Sum(sales).
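The "default unless the user specifies otherwise" rule can be sketched as follows. The keyword table and function name are hypothetical, and keyword spotting is only a stand-in for how an AI module might detect an explicitly requested aggregation.

```python
import re

# Default taken from the SALES description above.
DEFAULT_AGGREGATION = {"SALES": "sum"}

# Hypothetical keywords that signal an explicitly requested aggregation.
AGGREGATION_KEYWORDS = {
    "average": "avg",
    "mean": "avg",
    "count": "count",
    "maximum": "max",
    "minimum": "min",
}


def choose_aggregation(question: str, column: str) -> str:
    """Use the aggregation named in the question, else the column's default."""
    for word in re.findall(r"[a-z]+", question.lower()):
        if word in AGGREGATION_KEYWORDS:
            return AGGREGATION_KEYWORDS[word]
    return DEFAULT_AGGREGATION[column]


print(choose_aggregation("What was the sales amount for the last week?", "SALES"))  # sum
print(choose_aggregation("What was the average sales amount last week?", "SALES"))  # avg
```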

3. Clarify When Not to Use the Column

If a column should not be used in specific situations (e.g., easily confused column names such as DSAT and MSAT), mention it in the description. This helps the AI avoid errors and provides more accurate results.

Example:

Column Name: DSAT
- Description: When the user asks a question about ‘DSAT’ or ‘Dissatisfaction’, use this DSAT column. Do not use this column when the user asks about MSAT.

Use Case:

User asks: “What is the DSAT by category?”
- Response: The AI should show a chart for the DSAT column by category and not confuse it with the MSAT column.

4. Define Temporal Usage (Date/Time)

Columns related to dates or time should have clear guidance on how to use them for time-based queries, such as year, quarter, month, or specific timeframes.

Example:

Column Name: ORDER_DATE
- Description: This column refers to the date when the order was placed. When the user refers to a date, always use this ORDER_DATE column.
- Synonyms: “Order Timestamp”, “Transaction Date”

Use Case:

User asks: “How many orders were placed by date?”
- Response: The AI should show the data by the ORDER_DATE column.

Example 2:

Column Name: Survey Month
- Description: Month of the survey or activity. This could refer to the current, latest, or previous month.

Use Case:

User asks: “How many surveys were conducted in the current month?”
- Response: The AI should show the surveys by the latest Survey month.
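The "current month means the latest Survey Month" behavior can be sketched as below. The "YYYY-MM" string format and the function name are assumptions made for illustration; the source does not specify how Survey Month is stored.

```python
from collections import Counter


def surveys_in_latest_month(survey_months: list[str]) -> tuple[str, int]:
    """Count surveys in the most recent month present in the data."""
    counts = Counter(survey_months)
    # Zero-padded "YYYY-MM" strings sort chronologically, so max() is the latest.
    latest = max(counts)
    return latest, counts[latest]


# Invented sample data: one Survey Month value per survey.
print(surveys_in_latest_month(["2024-01", "2024-02", "2024-02", "2023-12"]))  # ('2024-02', 2)
```

Resolving "current" against the data rather than the calendar avoids an empty result when the newest data lags behind today's date.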

5. Clarify Units of Measurement

For columns that involve units of measurement, clearly state the units to avoid confusion and ensure accurate query responses.

Example:

Column Name: PRODUCTION_TIME
- Description: Use this column to refer to the total time taken for production in minutes. The unit of measurement is minutes.
- Unit: Minutes

Use Case:

User asks: “What is the total production time for this month?”
- Response: The AI should sum up the PRODUCTION_TIME column in minutes.

6. Explain Aggregation Limitations

Clearly mention any limitations or conditions when using a column for aggregation. This helps the AI avoid incorrect or misleading aggregations.

Example:

Column Name: CALL_DURATION
- Description: This column tracks the duration of each call in seconds. When the user asks about call duration, always use Sum(call_duration) unless the user explicitly specifies another aggregation. Do not use Count or Count Distinct for this column.

Use Case:

User asks: “What is the total call duration for this week?”
- Response: The AI should aggregate the CALL_DURATION column using Sum, not Count or Count Distinct.
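An aggregation limitation such as the one on CALL_DURATION can be thought of as a per-column rule set with a default and a list of disallowed aggregations. The structure and names below are hypothetical, and raising an error is just one way to handle a disallowed request.

```python
from typing import Optional

# Rules taken from the CALL_DURATION description above.
COLUMN_RULES = {
    "CALL_DURATION": {"default": "sum", "disallowed": {"count", "count_distinct"}},
}


def resolve_aggregation(column: str, requested: Optional[str] = None) -> str:
    """Return the aggregation to apply, rejecting ones the column disallows."""
    rules = COLUMN_RULES[column]
    aggregation = requested or rules["default"]
    if aggregation in rules["disallowed"]:
        raise ValueError(f"{aggregation} is not allowed for {column}")
    return aggregation


print(resolve_aggregation("CALL_DURATION"))         # sum
print(resolve_aggregation("CALL_DURATION", "avg"))  # avg
```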

7. Ensure Descriptions are Context-Aware

Tailor descriptions to the industry and use case.

Example:

Retail Column Name: SALES_REVENUE
- Description: Use this SALES_REVENUE column to refer to the total sales revenue. Common in retail environments for analyzing sales performance.
Healthcare Column Name: PATIENT_WAIT_TIME
- Description: Use this PATIENT_WAIT_TIME column to refer to the time a patient waits before being seen by a healthcare provider.

Use Case:

User asks: “What is the total revenue for the last week?” (Retail)
- Response: The AI should use the SALES_REVENUE column to calculate total revenue.
User asks: “What is the average patient wait time for this month?” (Healthcare)
- Response: The AI should use the PATIENT_WAIT_TIME column to calculate the average wait time.

By following these best practices when writing column descriptions, you help AI modules in Lumenore, such as Ask Me, AI dashboard, and Insights board, provide accurate, relevant, and context-sensitive answers. The goal is for the AI to understand the appropriate use of each column, the right way to aggregate its data, and how to interpret user queries across a wide range of industries and functions.
