Development with Very Large SOQL Queries

Deep Banerjee
6 min readSep 4, 2022

If you’re doing development in an org with a Large Data Volume then you must be careful with SOQL

For best performance, SOQL queries must be selective, particularly for queries inside triggers. To avoid long execution times, the system can terminate nonselective SOQL queries. Developers receive an error message when a non-selective query in a trigger executes against an object that contains more than 1 million records. To avoid this error, ensure that the query is selective.

Selective SOQL Query Criteria

  • A query is selective when one of the query filters is on an indexed field and the query filter reduces the resulting number of rows below a system-defined threshold. The performance of the SOQL query improves when two or more filters used in the WHERE clause meet the mentioned conditions.
  • The selectivity threshold is 10% of the first million records and less than 5% of the records after the first million records, up to a maximum of 333,333 records. In some circumstances, for example with a query filter that is an indexed standard field, the threshold can be higher. Also, the selectivity threshold is subject to change.

Custom Index Considerations for Selective SOQL Queries

  • The following fields are indexed by default.
  • Primary keys (Id, Name, and OwnerId fields)
  • Foreign keys (lookup or master-detail relationship fields)
  • Audit dates (CreatedDate and SystemModstamp fields)
  • RecordType fields (indexed for all standard objects that feature them)
  • Custom fields that are marked as External ID or Unique

Examples of Selective SOQL QueriesTo better understand whether a query on a large object is selective or not, let’s analyze some queries. For these queries, assume that there are more than 1 million records for the Account sObject. These records include soft-deleted records, that is, deleted records that are still in the Recycle Bin.Query 1:

SELECT Id FROM Account WHERE Id IN (<list of account IDs>)

The WHERE clause is on an indexed field (Id). If SELECT COUNT() FROM Account WHERE Id IN (<list of account IDs>) returns fewer records than the selectivity threshold, the index on Id is used. This index is typically used when the list of IDs contains only a few records.

Query 2:

SELECT Id FROM Account WHERE Name != ''

Since Account is a large object even though Name is indexed (primary key), this filter returns most of the records, making the query non-selective.

Query 3:

SELECT Id FROM Account WHERE Name != '' AND CustomField__c = 'ValueA'

Here we have to see if any filter, when considered individually, is selective. As we saw in the previous example, the first filter isn’t selective. So let’s focus on the second one. If the count of records returned by SELECT COUNT() FROM Account WHERE CustomField__c = ‘ValueA’ is lower than the selectivity threshold, and CustomField__c is indexed, the query is selective.

SOQL FOR LOOP

if the results are too large, this below syntax causes a runtime exception:

Account[] accts = [SELECT Id FROM Account];

nstead, use a SOQL query for loop as in one of the following examples:

// Use this format if you are not executing DML statements 
// within the for loop
for (Account a : [SELECT Id, Name FROM Account
WHERE Name LIKE 'Acme%']) {
// Your code without DML statements here
}
// Use this format for efficiency if you are executing DML statements
// within the for loop
for (List<Account> accts : [SELECT Id, Name FROM Account
WHERE Name LIKE 'Acme%']) {
// Your code here
update accts;
}

The following example demonstrates a SOQL query for loop that’s used to mass update records. Suppose that you want to change the last name of a contact in records for contacts whose first and last names match specified criteria:

public void massUpdate() {
for (List<Contact> contacts:
[SELECT FirstName, LastName FROM Contact]) {
for(Contact c : contacts) {
if (c.FirstName == 'Barbara' &&
c.LastName == 'Gordon') {
c.LastName = 'Wayne';
}
}
update contacts;
}
}

Despit this all , the preferred method of mass updating records is to use batch Apex, which minimizes the risk of hitting governor limits.

For more information, see SOQL For Loops.

Using Skinny Tables:

Say you’ve followed coding best practices and worked with Salesforce Customer Support to place custom indexes wherever appropriate, but you’re still encountering performance problems. Users are complaining about their reports and dashboards timing out, and the SOQL called from your Visualforce page is performing slower and slower. If you desperately need to further improve performance, there’s a special, powerful solution: skinny tables.

A skinny table is a custom table in the Force.com platform that contains a subset of fields from a standard or custom base Salesforce object. Force.com can have multiple skinny tables if needed, and maintains them and keeps them completely transparent to you.

By having narrower rows and less data to scan than the base Salesforce object, skinny tables allow Force.com to return more rows per database fetch, increasing throughput when reading from a large object, as this diagram shows:

Also, skinny tables don’t include soft-deleted rows (i.e., records in the Recycle Bin with isDeleted = true), which often reduces the table volume. Custom indexes on the base table are also replicated, and they usually perform better because of the reduced table joins that happen in the underlying database queries.

Here’s an example of how a skinny table can speed up queries. Instead of using a date range like 01/01/16 to 12/31/16 — which entails an expensive, repeated computation to create an annual or year-to-date report — you can use a skinny table to include a Year field and to filter on Year = ‘2016.’

The Force.com platform automatically synchronizes the rows between the base object and the skinny table, so the data is always kept current. The Force.com platform determines at query runtime when it would make sense to use skinny tables, so you don’t have to modify your reports or develop any Apex code or API calls.

Skinny tables are most useful with tables containing millions of records. They can be created on custom objects, and on Account, Contact, Opportunity, Lead, and Case objects. And they can enhance performance for reports, list views, and SOQL.

Skinny tables can be a convenient means of remedying performance issues. But they might not accommodate all use cases, or improve performance more than reading from the base Salesforce object with efficient indexes. They come with side effects you should understand because they might restrict or burden your business processes.

Polymorphic Relationships in SOQL Queries:

SELECT Id, Status, Subject, TYPEOF who
WHEN Contact THEN Name
WHEN Lead THEN Name, Phone
END
FROM Task

If you look at the query, I used a keyword “TYPEOF” which is used to get the type of value based on field that we are using. Here we are using “TYPEOF Who”, This means based on whoId if it is related to contact then we are getting the Name or If the whoId is related to Lead then we are getting the Name and Phone.

Example:

// List to fetch the Task records based on WhoId
List<Task> taskList = [SELECT Id, Status, Subject, TYPEOF who
WHEN Contact THEN Name
WHEN Lead THEN Name, Phone
END
FROM Task];
// Loop to iterate over the query results
for(Task task : taskList) {
if(task.Who instanceof Contact) {
Contact con = task.Who;
System.debug('Task Id: ' +task.Id+ 'related to Contact Name' +con.Name);
}else if(task.who instanceof Lead) {
Lead lead = task.who;
System.debug('Task Id ' + task.Id +' related to Lead Name ' + lead.Name);
}
}

Here we are using the instanceof keyword in our if and else if conditions. Instanceof keyword will help us to verify the right side of the keyword to the left of declared type in the expression.

Thanks for Reading !!

--

--