/**
* Marlin 3D Printer Firmware
* Copyright (C) 2016 MarlinFirmware [https://github.com/MarlinFirmware/Marlin]
*
* Based on Sprinter and grbl.
* Copyright (C) 2011 Camiel Gubbels / Erik van der Zalm
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/
#ifndef ULCDST7920_H
#define ULCDST7920_H
#include <U8glib.h>
#define ST7920_CLK_PIN LCD_PINS_D4
#define ST7920_DAT_PIN LCD_PINS_ENABLE
#define ST7920_CS_PIN LCD_PINS_RS
//#define PAGE_HEIGHT 8 //128 byte framebuffer
/*
 * Distribute GLCD screen updates in time.
 *
 * Previously the screens for a graphical LCD were drawn and sent all at once:
 * in two or four parts, but directly one after the other. For the tested
 * status screen this took 59-62 ms in a single block, during which nothing
 * else (except the interrupts) could run. When printing a sequence of very
 * short moves the buffer drained, sometimes until it was empty.
 *
 * The screen update is now split into parts. There are 10 time slots. Before,
 * the complete screen was drawn in the first one: (60,0,0,0,0,0,0,0,0,0).
 * Now pauses are introduced for doing other things:
 * (30,30,0,0,0,0,0,0,0,0) or (15,15,15,15,0,0,0,0,0,0).
 * Drawing in consecutive time slots keeps the lag small. Even with a
 * 4-stripe display all the drawing is done after 400 ms. Earlier experiments
 * with a more even distribution, such as (30,0,0,0,0,30,0,0,0,0) and
 * (15,0,15,0,15,0,15,0,0,0), did not feel good in the menu because of too
 * much lag.
 *
 * Because drawing 2 or 4 stripes now differs little in speed, it makes sense
 * for the REPRAP_DISCOUNT_FULL_GRAPHIC_SMART_CONTROLLER to go from 2 to 4
 * stripes. This costs about 1-2 ms per complete screen update, but is paid
 * back by partial updates lasting only half the time and by two additional
 * breaks. It also saves ~256 bytes of framebuffer RAM.
 *
 * Sample timings in ms (2 stripes, then 4 stripes):
 *   13:45:59.213 : echo: #:17 >:13 s:30; #:16 >:13 s:29; S#:33 S>:26 S:59
 *   13:46:00.213 : echo: #:16 >:14 s:30; #:17 >:13 s:30; S#:33 S>:27 S:60
 *   03:30:36.779 : echo: #:8 >:7 s:15; #:10 >:7 s:17; #:8 >:6 s:14; #:8 >:7 s:15; S#:34 S>:27 S:61
 *   03:30:37.778 : echo: #:8 >:6 s:14; #:10 >:7 s:17; #:9 >:7 s:16; #:8 >:6 s:14; S#:35 S>:26 S:61
 *
 * Legend:
 *   #:  draw a stripe
 *   >:  transfer a stripe
 *   s:  sum of draw and transfer for one stripe
 *   S#: sum of draws for a complete screen
 *   S>: sum of transfers for a complete screen
 *   S:  time to draw and transfer a complete screen
 */
#define PAGE_HEIGHT 16 //256 byte framebuffer
//#define PAGE_HEIGHT 32 //512 byte framebuffer
#define LCD_PIXEL_WIDTH 128
#define LCD_PIXEL_HEIGHT 64
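/*
 * Illustration only, not part of the driver: a host-side sketch of how
 * PAGE_HEIGHT trades RAM against the number of stripes per screen update.
 * The helper names framebuffer_bytes and stripes_per_screen are hypothetical
 * and introduced here only for the example; the formula mirrors the buffer
 * declaration further down, (LCD_PIXEL_WIDTH) * (PAGE_HEIGHT) / 8.
 *
```cpp
#include <cassert>
#include <cstdint>

// Bytes of page buffer needed for one stripe at one bit per pixel.
constexpr uint16_t framebuffer_bytes(uint16_t width, uint16_t page_height) {
  return width * page_height / 8;
}

// Number of stripes (pages) needed to cover the full display height.
constexpr uint8_t stripes_per_screen(uint16_t height, uint16_t page_height) {
  return height / page_height;
}

// Checks the three PAGE_HEIGHT options against their commented buffer sizes.
inline void check_page_heights() {
  assert(framebuffer_bytes(128, 8) == 128 && stripes_per_screen(64, 8) == 8);
  assert(framebuffer_bytes(128, 16) == 256 && stripes_per_screen(64, 16) == 4);
  assert(framebuffer_bytes(128, 32) == 512 && stripes_per_screen(64, 32) == 2);
}
```
 *
 * With PAGE_HEIGHT 16 the screen is sent in 4 stripes from a 256 byte buffer.
 */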
/*
 * 3 ms speedup for the ST7920, delays for BOARD_3DRAG, and ~1k of memory
 * saved by limiting `#pragma GCC optimize (3)` to this file
 * ('ultralcd_st7920_u8glib_rrd.h').
 *
 * This optimisation was not, and is not, applied to the other displays.
 * It caused the large additional memory use because the complete
 * 'ultralcd.cpp' and 'dogm_lcd_implementation.h' were optimised, with no
 * observed change in speed.
 *
 * The loop in ST7920_SWSPI_SND_8BIT() is unrolled by hand (the optimiser was
 * expected to do this, but did not). That removes the compares and
 * increments of the loop variable. Every CPU cycle in this loop costs at
 * least 0.5 ms per display update because it is executed more than
 * 1k times/s.
 *
 * The delays are pre-filled with values calculated for an ST7920 driven at
 * 4.5 V, and your own timing can be set in the configuration.
 *
 * At 4.5 V:
 *   1.) The CLK signal needs to be at least 200 ns high and 200 ns low.
 *   2.) The DAT pin needs to be set at least 40 ns before CLK goes high and
 *       must keep its value until 40 ns after CLK went high.
 *
 * A nop takes one processor cycle: 62.5 ns at 16 MHz, 50 ns at 20 MHz.
 * Condition 1.) needs 200/62.5 = 3.2 => 4 cycles (200/50 = 4 => 4).
 * For the low phase, setting the pin takes much longer than that. For the
 * high phase we (theoretically) have to add 2 nops, because changing CLK
 * takes only 2 cycles. Condition 2.) is always fulfilled because the
 * processor needs two cycles (100-125 ns) to switch the CLK pin.
 *
 * Tested: 5 displays at 4.9 V - 5.1 V and 16 MHz, where no delays are needed.
 * Untested: 20 MHz, 3DRAG, and displays supplied with less than 5 V. If you
 * have such a setup, please experiment with longer or shorter delays and
 * give feedback.
 */
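/*
 * Illustration only, not part of the driver: a host-side sketch of the cycle
 * arithmetic in the timing note above (200 ns minimum CLK phase, one nop per
 * CPU cycle). The helper names cycles_for_ns and extra_nops are hypothetical
 * and exist only in this example; the 2 cycles already spent on switching
 * the CLK pin is the figure quoted in the note, not a measured value.
 *
```cpp
#include <cassert>
#include <cstdint>

// Whole CPU cycles needed to cover min_ns nanoseconds at f_cpu Hz, rounded up.
constexpr uint32_t cycles_for_ns(uint32_t min_ns, uint32_t f_cpu) {
  // ceil(min_ns * f_cpu / 1e9) in 64-bit integer arithmetic
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(min_ns) * f_cpu + 999999999ULL) / 1000000000ULL);
}

// Nops still to add when 'spent' cycles are already consumed by the pin write.
constexpr uint32_t extra_nops(uint32_t min_ns, uint32_t f_cpu, uint32_t spent) {
  return cycles_for_ns(min_ns, f_cpu) > spent
             ? cycles_for_ns(min_ns, f_cpu) - spent
             : 0;
}

// Reproduces the figures from the note: 3.2 => 4 cycles and 4 => 4 cycles.
inline void check_st7920_timing() {
  assert(cycles_for_ns(200, 16000000UL) == 4);
  assert(cycles_for_ns(200, 20000000UL) == 4);
  assert(extra_nops(200, 16000000UL, 2) == 2);
}
```
 *
 * These are theoretical values for 4.5 V; the defaults below are smaller
 * because the tested displays needed no delays at 16 MHz.
 */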
//set optimization so ARDUINO optimizes this file
#pragma GCC optimize (3)
// If you want you can define your own set of delays in Configuration.h
//#define ST7920_DELAY_1 DELAY_0_NOP
//#define ST7920_DELAY_2 DELAY_0_NOP
//#define ST7920_DELAY_3 DELAY_0_NOP
#if F_CPU >= 20000000
#define CPU_ST7920_DELAY_1 DELAY_0_NOP
#define CPU_ST7920_DELAY_2 DELAY_0_NOP
#define CPU_ST7920_DELAY_3 DELAY_1_NOP
#elif MB(3DRAG) || MB(K8200) || MB(K8400) || MB(SILVER_GATE)
#define CPU_ST7920_DELAY_1 DELAY_0_NOP
#define CPU_ST7920_DELAY_2 DELAY_3_NOP
#define CPU_ST7920_DELAY_3 DELAY_0_NOP
#elif MB(MINIRAMBO) || MB(EINSYRAMBO)
#define CPU_ST7920_DELAY_1 DELAY_0_NOP
#define CPU_ST7920_DELAY_2 DELAY_4_NOP
#define CPU_ST7920_DELAY_3 DELAY_0_NOP
#elif MB(RAMBO)
#define CPU_ST7920_DELAY_1 DELAY_0_NOP
#define CPU_ST7920_DELAY_2 DELAY_0_NOP
#define CPU_ST7920_DELAY_3 DELAY_0_NOP
#elif F_CPU == 16000000
#define CPU_ST7920_DELAY_1 DELAY_0_NOP
#define CPU_ST7920_DELAY_2 DELAY_0_NOP
#define CPU_ST7920_DELAY_3 DELAY_1_NOP
#else
#error "No valid condition for delays in 'ultralcd_st7920_u8glib_rrd.h'"
#endif
#ifndef ST7920_DELAY_1
#define ST7920_DELAY_1 CPU_ST7920_DELAY_1
#endif
#ifndef ST7920_DELAY_2
#define ST7920_DELAY_2 CPU_ST7920_DELAY_2
#endif
#ifndef ST7920_DELAY_3
#define ST7920_DELAY_3 CPU_ST7920_DELAY_3
#endif
#define ST7920_SND_BIT \
  WRITE(ST7920_CLK_PIN, LOW); ST7920_DELAY_1; \
  WRITE(ST7920_DAT_PIN, val & 0x80); ST7920_DELAY_2; \
  WRITE(ST7920_CLK_PIN, HIGH); ST7920_DELAY_3; \
  val <<= 1

// Send one byte MSB first over software SPI; the loop is unrolled by hand for speed
static void ST7920_SWSPI_SND_8BIT(uint8_t val) {
  ST7920_SND_BIT; // 1
  ST7920_SND_BIT; // 2
  ST7920_SND_BIT; // 3
  ST7920_SND_BIT; // 4
  ST7920_SND_BIT; // 5
  ST7920_SND_BIT; // 6
  ST7920_SND_BIT; // 7
  ST7920_SND_BIT; // 8
}
#if defined(DOGM_SPI_DELAY_US) && DOGM_SPI_DELAY_US > 0
#define U8G_DELAY() delayMicroseconds(DOGM_SPI_DELAY_US)
#else
#define U8G_DELAY() u8g_10MicroDelay()
#endif
#define ST7920_CS() { WRITE(ST7920_CS_PIN,1); U8G_DELAY(); }
#define ST7920_NCS() { WRITE(ST7920_CS_PIN,0); }
#define ST7920_SET_CMD() { ST7920_SWSPI_SND_8BIT(0xF8); U8G_DELAY(); }
#define ST7920_SET_DAT() { ST7920_SWSPI_SND_8BIT(0xFA); U8G_DELAY(); }
#define ST7920_WRITE_BYTE(a) { ST7920_SWSPI_SND_8BIT((uint8_t)((a)&0xF0u)); ST7920_SWSPI_SND_8BIT((uint8_t)((a)<<4u)); U8G_DELAY(); }
#define ST7920_WRITE_BYTES(p,l) { for (uint8_t i = l + 1; --i;) { ST7920_SWSPI_SND_8BIT(*p&0xF0); ST7920_SWSPI_SND_8BIT(*p<<4); p++; } U8G_DELAY(); }
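/*
 * Illustration only, not part of the driver: a host-side sketch of what
 * ST7920_WRITE_BYTE puts on the wire. Each payload byte is sent as two
 * bytes, the upper nibble first and the lower nibble second, each in the
 * top four bits with the low four bits zero. The names St7920Frame and
 * split_byte are hypothetical and used only in this example.
 *
```cpp
#include <cassert>
#include <cstdint>

// The two bytes actually clocked out for one payload byte.
struct St7920Frame {
  uint8_t high; // upper nibble of the payload, low 4 bits zero
  uint8_t low;  // lower nibble of the payload shifted up, low 4 bits zero
};

// Same split as the macro: (a & 0xF0) followed by (a << 4).
constexpr St7920Frame split_byte(uint8_t a) {
  return { static_cast<uint8_t>(a & 0xF0u), static_cast<uint8_t>(a << 4) };
}

// Example with 0x3E, a command byte used in the init sequence.
inline void check_split_byte() {
  assert(split_byte(0x3E).high == 0x30);
  assert(split_byte(0x3E).low == 0xE0);
}
```
 *
 * The sync bytes 0xF8 (command) and 0xFA (data) sent by ST7920_SET_CMD and
 * ST7920_SET_DAT are not split; they go out as a single byte.
 */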
uint8_t u8g_dev_rrd_st7920_128x64_fn(u8g_t *u8g, u8g_dev_t *dev, uint8_t msg, void *arg) {
  uint8_t i, y;
  switch (msg) {
    case U8G_DEV_MSG_INIT: {
      OUT_WRITE(ST7920_CS_PIN, LOW);
      OUT_WRITE(ST7920_DAT_PIN, LOW);
      OUT_WRITE(ST7920_CLK_PIN, HIGH);

      ST7920_CS();
      u8g_Delay(120);                 //initial delay for boot up
      ST7920_SET_CMD();
      #if defined(LULZBOT_LCD_CLEAR_WORKAROUND)
        ST7920_WRITE_BYTE(0x20);      //non-extended mode
        ST7920_WRITE_BYTE(0x08);      //display off, cursor+blink off
        ST7920_WRITE_BYTE(0x01);      //clear DDRAM
        u8g_Delay(15);                //delay for DDRAM clear
        ST7920_WRITE_BYTE(0x24);      //extended mode
        ST7920_WRITE_BYTE(0x26);      //extended mode + GDRAM active
      #else
        ST7920_WRITE_BYTE(0x08);      //display off, cursor+blink off
        ST7920_WRITE_BYTE(0x01);      //display clear (clears DDRAM)
        u8g_Delay(15);                //delay for the clear to finish
        ST7920_WRITE_BYTE(0x3E);      //extended mode + GDRAM active
      #endif
      for (y = 0; y < (LCD_PIXEL_HEIGHT) / 2; y++) { //clear GDRAM
        ST7920_WRITE_BYTE(0x80 | y);  //set y
        ST7920_WRITE_BYTE(0x80);      //set x = 0
        ST7920_SET_DAT();
        for (i = 0; i < 2 * (LCD_PIXEL_WIDTH) / 8; i++) //2x width clears both segments
          ST7920_WRITE_BYTE(0);
        ST7920_SET_CMD();
      }
      ST7920_WRITE_BYTE(0x0C);        //display on, cursor+blink off
      ST7920_NCS();
    }
    break;

    case U8G_DEV_MSG_STOP: break;

    case U8G_DEV_MSG_PAGE_NEXT: {
      uint8_t* ptr;
      u8g_pb_t* pb = (u8g_pb_t*)(dev->dev_mem);
      y = pb->p.page_y0;
      ptr = (uint8_t*)pb->buf;

      ST7920_CS();
      for (i = 0; i < PAGE_HEIGHT; i++) {
        ST7920_SET_CMD();
        if (y < 32) {
          ST7920_WRITE_BYTE(0x80 | y);        //y
          ST7920_WRITE_BYTE(0x80);            //x address 0 (upper half of the panel)
        }
        else {
          ST7920_WRITE_BYTE(0x80 | (y - 32)); //y
          ST7920_WRITE_BYTE(0x80 | 8);        //x address 8 (lower half of the panel)
        }
        ST7920_SET_DAT();
        ST7920_WRITE_BYTES(ptr, (LCD_PIXEL_WIDTH) / 8); //ptr is incremented inside of macro
        y++;
      }
      ST7920_NCS();
    }
    break;
  }
  #if PAGE_HEIGHT == 8
    return u8g_dev_pb8h1_base_fn(u8g, dev, msg, arg);
  #elif PAGE_HEIGHT == 16
    return u8g_dev_pb16h1_base_fn(u8g, dev, msg, arg);
  #else
    return u8g_dev_pb32h1_base_fn(u8g, dev, msg, arg);
  #endif
}
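/*
 * Illustration only, not part of the driver: a host-side sketch of the row
 * addressing used in the U8G_DEV_MSG_PAGE_NEXT case above. Rows 0-31 are
 * written at horizontal address 0, rows 32-63 at horizontal address 8 with
 * the vertical address reduced by 32. The names GdramAddress and
 * gdram_address are hypothetical and used only in this example.
 *
```cpp
#include <cassert>
#include <cstdint>

// The two address command bytes sent before a row of pixel data.
struct GdramAddress {
  uint8_t vertical;   // 0x80 | row within the 32-row half
  uint8_t horizontal; // 0x80 for rows 0-31, 0x88 for rows 32-63
};

// Same branch as in the driver: y < 32 selects the first half.
constexpr GdramAddress gdram_address(uint8_t y) {
  return y < 32
    ? GdramAddress{ static_cast<uint8_t>(0x80 | y), 0x80 }
    : GdramAddress{ static_cast<uint8_t>(0x80 | (y - 32)), 0x88 };
}

// Boundary rows of the two halves.
inline void check_gdram_address() {
  assert(gdram_address(31).vertical == 0x9F && gdram_address(31).horizontal == 0x80);
  assert(gdram_address(32).vertical == 0x80 && gdram_address(32).horizontal == 0x88);
}
```
 *
 * This is also why the init code clears 2x the pixel width per row for only
 * LCD_PIXEL_HEIGHT / 2 rows.
 */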
uint8_t u8g_dev_st7920_128x64_rrd_buf[(LCD_PIXEL_WIDTH) * (PAGE_HEIGHT) / 8] U8G_NOCOMMON;
u8g_pb_t u8g_dev_st7920_128x64_rrd_pb = {{PAGE_HEIGHT, LCD_PIXEL_HEIGHT, 0, 0, 0}, LCD_PIXEL_WIDTH, u8g_dev_st7920_128x64_rrd_buf};
u8g_dev_t u8g_dev_st7920_128x64_rrd_sw_spi = {u8g_dev_rrd_st7920_128x64_fn, &u8g_dev_st7920_128x64_rrd_pb, &u8g_com_null_fn};
class U8GLIB_ST7920_128X64_RRD : public U8GLIB {
  public:
    U8GLIB_ST7920_128X64_RRD(uint8_t dummy) : U8GLIB(&u8g_dev_st7920_128x64_rrd_sw_spi) { UNUSED(dummy); }
};
#if ENABLED(LULZBOT_LIGHTWEIGHT_UI)
typedef const __FlashStringHelper *progmem_str;
// We have to include the code for the lightweight UI here
// as it relies on macros that are only defined in this file.
#include "status_screen_lite_ST7920_spi.h"
#endif
#pragma GCC reset_options
#endif // ULCDST7920_H