



WordPress LSCache Plugin: The great new crawler I was waiting for :)

Last Updated on: Wed, 15 Apr 2026 00:00:02
Hi (please excuse my typos, I'm not a native English speaker). I have several questions about this great new feature:

#1 Is it always activated? I haven't seen any ON/OFF button.
#2 How does your plugin find my sitemap?
#2 Delay: If I understand correctly, this is the crawl delay between each URL?
#3 Threads: Setting it to 3 means that you are running 3 crawls at the same time?
#4 Run Duration: What is a crawl interval? By default it is set to 200s.
#5 Server Load Limit: It is set to 1 by default, but what does 1 mean?

If I ask you all these questions, it's because, like a lot of people, I'm on a shared server that doesn't like to be overloaded. Before coming to your plugin I was using WP Rocket, and that one caused a lot of server crashes because it pulled too hard on my shared server. Thanks.

Edit: I'm using the WP-Cron Events plugin to manage WP Cron, but I do not see your task?

This topic was modified 5 years, 5 months ago by pako69.

Hi Pako69,

Glad you are excited too :).

#1 By default it's off. You can turn it on in LiteSpeed Cache -> Crawler. Under Activation there is a beautiful switch button ^_^.
#2 It will generate the sitemap based on the posts and pages in your database, following your permalink setting. If you have any other specific URLs, you can use the filter litespeed_crawler_sitemap to append them to the sitemap.
#3 That is true, with one caveat: the delay is not per URL but per batch of threads. E.g., if you currently have 3 threads, it will crawl 3 URLs simultaneously and then sleep for the Delay microseconds. As long as there is only 1 thread, your understanding is right.
#4 That means each time the crawler runs, it will only run for up to 200s, and then exit until the next cron run.
#5 That server load is the Linux server load. A completely idle computer has a load average of 0. Each running process either using or waiting for CPU resources adds 1 to the load average. We designed the server load limit and dynamic threads settings exactly for this overload issue.
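As an aside, the load average described above is the same number Linux reports via `uptime` or `/proc/loadavg`, and plain PHP can read it directly. A minimal check, assuming a Linux host (the function is not available on Windows):

```php
<?php
// sys_getloadavg() returns the 1-, 5- and 15-minute load averages --
// the same figures the crawler's Server Load Limit is compared against.
$load = sys_getloadavg();
printf( "1-min load average: %.2f\n", $load[0] );
```

If this prints a value well above your CPU core count while the crawler runs, the Server Load Limit is the setting to lower.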
When the crawler is running, if the server load goes higher than the setting, it will stop automatically. So it won't cause the server crash issue like the other plugin you mentioned.

E.g., assume these settings: Server Load Limit = 5, Threads = 4, and the actual server load is 2 when the crawler starts running. Here is the process: the crawler crawls 4 URLs at a time; if it then finds server load >= 5, it reduces the threads to 3 and keeps crawling. If it overloads again, it reduces the threads to 2, and so on. If only 1 thread is left and the server is still overloaded, it will exit; otherwise it will increase the threads one at a time and crawl again. The maximum it will be raised back to is 4.

We are always here to answer your questions. It's our pleasure. Cheers.

BTW, you didn't see the cron task because the crawler is deactivated, as mentioned in #1.

Hi @hailite, thanks for all those explanations.

#1 My bad! I didn't see this tab. However, I'm a little bit lost on how to set it up correctly. Imagine I want the cache to be generated every 72 hours. Where do I set up those 72 hours? Because I see a Crawl Interval set to 604800 and an Interval Between Runs set to 28800, and it seems that the crawler uses the 28800 to run.
#2 If I understand correctly, your sitemap is a fake sitemap; I mean, it is generated and used only for your own crawl, and Google never sees it? Maybe (in a future release) you could add support for crawling the sitemap index generated by Yoast SEO? That one contains only what I want Google to crawl, which means I would not need anything else to be cached (but that's only my point of view).
#5 What settings do you recommend for a shared server? I know they are not all the same, but just to give an idea?

Thanks for this wonderful plugin! (Used with the Autoptimize plugin, it's a great replacement for all those plugins and premium plugins.)

EDIT: I think I have not set it up correctly? > https://s26.postimg.org/yrqleyqux/Capture_d_e_cran_2017-06-07_a_16.09.04.jpg
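The dynamic thread reduction @hailite describes above can be sketched as a single pure function. This is a hypothetical illustration, not the plugin's actual internals; the function name and the convention that 0 means "stop crawling" are both invented here:

```php
<?php
// Sketch of the dynamic thread adjustment described above (illustrative only).
// Returns the thread count to use for the next batch; 0 means "stop crawling".
function adjust_threads( int $threads, int $max_threads, float $load, float $load_limit ): int
{
    if ( $load >= $load_limit ) {
        // Overloaded: back off by one thread; if already at 1 thread, give up.
        return $threads > 1 ? $threads - 1 : 0;
    }
    // Load is fine: ramp back up one thread at a time, capped at the maximum.
    return min( $threads + 1, $max_threads );
}
```

With a limit of 5 and a maximum of 4 threads, a load spike at 4 threads drops the crawler to 3, a spike at 1 thread stops it entirely, and once the load falls back under the limit the count climbs back toward 4.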
When I pushed On the beautiful switch button ^_^ and then went to look at the cron tasks, yes, I see your cron task, but also an error message: https://s26.postimg.org/y3hqw0s55/cron.jpg So I switched it Off, returned to the cron list, and the error had disappeared.

#1 Well, the current settings for the crawl interval may be a bit misleading. We will change the default values in the next hotfix release. If you want the whole sitemap to be crawled every 72 hours, then yes, you need to set the Crawl Interval to 72 hours.
#2 Can't agree more. The feature that allows users to customize the sitemap is already on our development schedule. We will teach the crawler to read Google-friendly sitemaps.
#5 As mentioned in #1, the defaults will be changed. However, since you have already saved the settings once, the default values won't be applied anymore. The new default settings are: Run Duration => 400 seconds; Interval Between Runs => 600 seconds; Crawl Interval => 302400 seconds (your 72 hours is 259200, which should be better).

Yay, the crawler is nice! Just one question: I noticed that my portfolio custom post type is not being added to the crawler sitemap. The CPT is found by LSCache, and it is visible in the Available Custom Post Types list on Settings > Crawler, but it's not in the crawlermap.data file, and it generates LiteSpeed Cache misses. Could you give some more info on how to "use filter litespeed_crawler_sitemap to append them to sitemap"? Thanks, Phil

> well the current settings for crawl interval may be a bit misleading

I agree with you. Maybe you should keep it as simple as possible if you want non-technical people to use it. For example, just one field: Cache purge frequency: xxx (hours/days/months). But not all those settings for the crawler and for the cache (cache TTL, etc.). Just my 2 cents.

Hello, and thank you for adding the crawler to your plugin. One question, please: when manually running the crawler, do I have to stay on that page during the crawl?
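For anyone confused by the interval numbers flying around above: all three crawler timing settings are expressed in seconds, so the 72-hour target works out as follows (the variable names here are just labels for the settings, not real option keys):

```php
<?php
// All crawler timing settings are in seconds.
$crawl_interval        = 72 * 3600; // 259200 s: recrawl the whole sitemap every 72 hours
$interval_between_runs = 600;       // 600 s between partial runs (suggested shared-host value)
$run_duration          = 400;       // each partial run crawls for at most 400 s

echo $crawl_interval; // 259200
```

So the Crawl Interval governs how often a *full* crawl cycle starts, while Run Duration and Interval Between Runs only shape how that cycle is chopped into short bursts.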
Could you also please put together a guide on the best settings for a shared-host website? Thanking you in advance, Georgios http://www.ango.gr

@speango No, you don't have to stay on that page, even when you click Manually Run. For shared hosting, you can try these: Run Duration => 400 seconds; Interval Between Runs => 600 seconds; Crawl Interval => 302400 seconds. Keep all the others at their defaults.

@philbee https://github.com/litespeedtech/lscache_wp/blob/v1_0_x/litespeed-cache/includes/class-litespeed-cache-crawler-sitemap.php#L136 When the sitemap is being generated, before saving, it will call the filter litespeed_crawler_sitemap. So if any other plugin author or user wants to add a certain URL list for the crawler, they can use add_filter( 'litespeed_crawler_sitemap', 'append_your_list_sample_function' ) to enroll new URLs. About your portfolio custom post type, we may need more debug info. Please wait for the next hotfix release.

@hailite I have set it up like you said for a shared server and done a manual run:

LiteSpeed Cache Crawler
The last sitemap crawl began at 06/07/2017 14:03:08
The next sitemap crawl will start at 06/11/2017 02:03:08
Ended reason: Stopped due to exceeding defined Maximum Run Time
Last crawled: 40 item(s)

So, if I understand, the crawl did not finish its job, and the next scheduled attempt will be in 4 days?! Ouch. Thank you.

@pako69 That is fine. Those 4 days are for a whole new crawling process. The current unfinished process will resume based on the Run Frequency column.

@speango You are most welcome.

Hi, @pako69 and all! Our wiki has been updated to describe all of the crawler settings. Hopefully it can shed some light on any questions you may still have! And, as always, if you still find yourself puzzled after taking a look at the wiki, we'd be happy to help you right here.

Lisa @ LiteSpeed
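To make the add_filter usage from the @philbee reply concrete: the filter name litespeed_crawler_sitemap comes from the plugin, but the callback name and the URLs below are invented for illustration. A minimal sketch, assuming the filter receives and returns an array of URL paths:

```php
<?php
// Hypothetical callback for the litespeed_crawler_sitemap filter.
// $sitemap is the plugin's generated array of URLs to crawl;
// append any extra paths you want the crawler to warm up.
function my_append_crawler_urls( $sitemap ) {
    $sitemap[] = '/my-portfolio/';
    $sitemap[] = '/landing-page/';
    return $sitemap;
}

// Inside WordPress (e.g. your theme's functions.php):
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'litespeed_crawler_sitemap', 'my_append_crawler_urls' );
}
```

This would be one way to get a custom post type archive into the crawler's list until the plugin handles CPTs natively, as promised above.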





