Mark's web presence | nothing very exciting, just some guy's web presence

last_indexed details

Hi Tim,

The last_indexed field contains 0 if the file has just been added to the search_attachments_files table; otherwise it contains a timestamp recording when the file was last successfully indexed. The field is updated when the helper extracts the file's text, just before that text is passed to Drupal's search_index() function, which actually updates the index (so the field would more properly be called 'last_extracted' rather than 'last_indexed'). If the extracted text is non-empty, last_indexed is set to the current timestamp; if it is the empty string (''), the field is set to zero.
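A minimal Python sketch of that update rule, assuming a table row can be modelled as a plain dict and that the helper returns the extracted text as a string. The real module is PHP; record_extraction() and the example file name are hypothetical.

```python
import time
from typing import Optional


def record_extraction(file_row: dict, extracted_text: str,
                      now: Optional[int] = None) -> dict:
    """Apply the last_indexed rule to one (hypothetical, dict-shaped) row
    of search_attachments_files."""
    if now is None:
        now = int(time.time())
    # Any non-empty text, even a truncated extraction, counts as success.
    file_row["last_indexed"] = now if extracted_text != "" else 0
    return file_row


row = {"filepath": "example.pdf", "last_indexed": 0}
record_extraction(row, "some extracted text", now=1200000000)
print(row["last_indexed"])  # 1200000000
record_extraction(row, "", now=1200000500)
print(row["last_indexed"])  # 0
```

Because the check only distinguishes '' from everything else, a file whose extraction stops partway through still gets a fresh timestamp.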

The information you provided is very useful. What I think is happening is that only part of the text of Helpdesk Files/STC Main Menu.pdf is being extracted. The partial text is non-empty, so my check for '' passes and last_indexed gets updated, but the extraction fails before all of the text has been pulled out. We need to figure out why the full text is not being extracted. Let me think about this for a bit. I would like to add a new field to the search_attachments_files table that holds the extracted text. This would record where files like this one fail, and it would also allow alternative extraction schedules, i.e., external helper scripts that update only the search_attachments_files table and don't touch the search_dataset and search_index tables. This feature might not make it into 5.x-4, since I really need to get that version out, but it will definitely be in the next version (which might be the first 6.x version).
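The proposed extracted-text field could look roughly like this SQLite sketch. Only the table name and the last_changed/last_indexed columns come from the reply; fid, filepath, extracted_text, and store_extraction() are hypothetical, and the real module would use MySQL through Drupal's database layer rather than sqlite3.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: fid, filepath and extracted_text are assumptions.
    conn.execute(
        """CREATE TABLE search_attachments_files (
               fid INTEGER PRIMARY KEY,
               filepath TEXT NOT NULL,
               last_changed INTEGER NOT NULL DEFAULT 0,
               last_indexed INTEGER NOT NULL DEFAULT 0,
               extracted_text TEXT
           )"""
    )
    return conn


def store_extraction(conn: sqlite3.Connection, fid: int, text: str, now: int) -> None:
    """What an external helper script might do: write the extracted text and
    timestamp to search_attachments_files only, leaving the search tables alone."""
    conn.execute(
        "UPDATE search_attachments_files "
        "SET extracted_text = ?, last_indexed = ? WHERE fid = ?",
        (text, now if text else 0, fid),
    )
    conn.commit()


conn = make_db()
conn.execute(
    "INSERT INTO search_attachments_files (fid, filepath, last_changed) VALUES (?, ?, ?)",
    (1, "Helpdesk Files/STC Main Menu.pdf", 100),
)
store_extraction(conn, 1, "STC Main Menu", 150)
row = conn.execute(
    "SELECT last_indexed, length(extracted_text) FROM search_attachments_files WHERE fid = 1"
).fetchone()
print(row)  # (150, 13)
```

With the text stored alongside the timestamps, the tail of extracted_text shows where an extraction stopped, without having to inspect search_dataset or search_index.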

In response to your queries:

1. How many passes does it take to index a file?

One. If it can't be indexed in one pass, we'd probably see symptoms like the ones you have documented.

2. Do multiple passes of a file occur in order to index it?

No, the module assumes that it can extract the text from a file and hand that text off to Drupal for indexing in one pass. It's a one-to-one relationship.

3. What tells search_attachments to reindex a file? I am assuming there is a check against a modified flag? What situation would also cause it to do multiple passes of the same file in the same cron run?

A file is reindexed if its last_changed value is greater than its last_indexed value. When cron.php runs, it invokes search_attachments twice: once to update the search_attachments_files table by checking the last-updated times of all the listed files (and to add any new files or remove any that have been deleted from the file management modules or from the file system), and once through Drupal's normal cron-driven indexing, which fires hook_update_index() in every module that implements it and so runs search_attachments_update_index().
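A simplified Python model of those two stages, assuming file modification times are plain integers and the helper is any function that returns text for a path. AttachmentFile, cron_run() and fake_extract() are hypothetical names; the real module gets its file list from the file management modules and the file system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AttachmentFile:
    path: str
    last_changed: int
    last_indexed: int = 0  # 0 means "newly added, never extracted"


def needs_reindex(f: AttachmentFile) -> bool:
    # The rule from the reply: reindex when last_changed > last_indexed.
    return f.last_changed > f.last_indexed


def cron_run(table: Dict[str, AttachmentFile],
             files_on_disk: Dict[str, int],
             extract: Callable[[str], str],
             now: int) -> List[str]:
    """Simulate one cron run; returns the paths extracted during it."""
    # Stage 1: sync the table with the files that currently exist.
    for path in list(table):
        if path not in files_on_disk:
            del table[path]  # file was deleted
    for path, mtime in files_on_disk.items():
        if path in table:
            table[path].last_changed = mtime
        else:
            table[path] = AttachmentFile(path, mtime)  # new file

    # Stage 2: the hook_update_index() pass, at most one extraction per file.
    extracted = []
    for f in table.values():
        if needs_reindex(f):
            text = extract(f.path)
            f.last_indexed = now if text != "" else 0
            extracted.append(f.path)
    return extracted


def fake_extract(path: str) -> str:
    return "text of " + path  # stand-in for the real helper


table: Dict[str, AttachmentFile] = {}
print(cron_run(table, {"a.pdf": 100, "b.pdf": 50}, fake_extract, now=200))  # ['a.pdf', 'b.pdf']
print(cron_run(table, {"a.pdf": 100, "b.pdf": 50}, fake_extract, now=300))  # []
print(cron_run(table, {"a.pdf": 250}, fake_extract, now=400))  # ['a.pdf']
```

Under this rule a file whose extraction returns '' keeps last_indexed at 0 and is retried on every later run, while an unchanged, successfully extracted file is skipped.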

The module doesn't (intentionally, anyway) perform multiple passes of the same file in the same cron run.

Re. the distinct error, did you get my email where I said I logged into your server?

Mark
